8 research outputs found

    Integrated mining of feature spaces for bioinformatics domain discovery

    One of the major challenges in bioinformatics is elucidating protein folding for the functional annotation of proteins. The factors that govern protein folding include the chemical, physical, and environmental conditions of the protein's surroundings, which can be measured and exploited for computational discovery. These conditions enable a protein to transform from a linear sequence of amino acids into a globular three-dimensional structure. Information about the folded state of a protein has significant potential to explain biochemical pathways and their involvement in disorders and diseases; it affects how genetic diseases are characterized and cured and how designer drugs are created. Given the exponential growth of protein databases and the limitations of experimental structure determination, sophisticated computational methods have been developed and applied to search for, detect, and compare protein homologs. Most computational tools for protein structure prediction rely primarily on sequence similarity searches. These approaches have improved prediction accuracy for proteins with high sequence similarity but perform poorly on proteins with low sequence similarity. Data mining offers algorithmic approaches that have been widely used to build automatic protein structure classification and prediction systems. In this dissertation, we present a novel approach that integrates physico-chemical properties with effective feature extraction techniques for protein classification. Our approach overcomes a major obstacle to data mining in protein databases: encapsulating different hydrophobicity properties of residues in a much-reduced feature space that retains high specificity and sensitivity for protein structure classification.
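To make "encapsulating hydrophobicity properties in a reduced feature space" concrete, the sketch below maps a sequence onto one standard hydrophobicity scale (Kyte-Doolittle) and averages the resulting profile over a fixed number of contiguous windows, so proteins of any length yield vectors of the same size. This is a minimal illustration under stated assumptions, not the dissertation's algorithm: the choice of scale, the window-averaging reduction, and the function names `hydropathy_profile` and `segment_means` are all hypothetical.

```python
# Kyte-Doolittle hydropathy values: one standard hydrophobicity scale,
# chosen here for illustration only.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence):
    """Map a one-letter amino acid sequence to per-residue hydropathy values."""
    try:
        return [KYTE_DOOLITTLE[residue] for residue in sequence.upper()]
    except KeyError as err:
        raise ValueError(f"unsupported residue code: {err.args[0]}") from None


def segment_means(sequence, n_segments):
    """Hypothetical reduction: average the profile over n_segments contiguous
    windows so that proteins of any length map to vectors of the same size."""
    profile = hydropathy_profile(sequence)
    length = len(profile)
    if n_segments < 1 or length < n_segments:
        raise ValueError("need 1 <= n_segments <= sequence length")
    features = []
    for i in range(n_segments):
        start = i * length // n_segments
        end = (i + 1) * length // n_segments
        window = profile[start:end]  # never empty because length >= n_segments
        features.append(sum(window) / len(window))
    return features


# Two sequences of different lengths yield feature vectors of equal size.
print(segment_means("MKTAYIAKQR", 4))
print(segment_means("GAVLIM", 4))
```

Fixed-length vectors are what most off-the-shelf classifiers expect, which is one simple way to sidestep unequal sequence lengths; richer schemes would keep more positional detail.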
We have developed three computational algorithms for coherent feature extraction based on selected property scales of the protein sequence. To deal with the unequal cardinality (length) of proteins, our integration scheme effectively handles proteins of varying size and scales well as the dimensionality of the sequences increases. We also describe a two-fold methodology for protein functional annotation. First, we present an algorithm that integrates multiple physico-chemical properties into a multi-layered abstract feature space, with each layer corresponding to one physico-chemical property. Second, we discuss a wavelet-based segmentation approach that efficiently detects regions of property conservation across all layers of this feature space. Finally, we present a graph-theoretic algorithmic framework that identifies conserved hydrophobic residue interaction patterns using the identified hydrophobicity scales. We show that these discriminatory features, which consist of conserved hydrophobic residues, are specific to a protein family and can be used for structural classification. We also present rigorously tested validation schemes whose high accuracy indicates that homologous proteins conserve physico-chemical properties along the protein backbone. We conclude by summarizing our results and contributions and outlining our goals for future research.
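The multi-layered feature space and wavelet-based segmentation described above can be illustrated with a small sketch. Here each layer is a list of per-residue values for one physico-chemical property, and an undecimated level-1 Haar detail coefficient (a scaled difference between neighboring residues) is computed for every layer. The residue axis is cut wherever any layer changes sharply, so the remaining segments are stretches where every property stays roughly constant. The threshold, the use of Haar, the reading of "conservation" as local stability, and the names `stationary_haar_detail` and `segment_layers` are assumptions for illustration, not the dissertation's method.

```python
import math


def stationary_haar_detail(layer):
    """Level-1 undecimated Haar detail coefficients, one per adjacent residue pair."""
    return [(layer[i] - layer[i + 1]) / math.sqrt(2) for i in range(len(layer) - 1)]


def segment_layers(layers, threshold):
    """Cut the residue axis wherever ANY property layer shows a Haar detail
    coefficient larger than threshold in magnitude. The resulting half-open
    (start, end) segments are stretches where every layer stays stable."""
    length = len(layers[0])
    if any(len(layer) != length for layer in layers):
        raise ValueError("all property layers must cover the same residues")
    details = [stationary_haar_detail(layer) for layer in layers]
    # A large coefficient at index i marks a jump between residues i and i + 1.
    cuts = [i + 1 for i in range(length - 1)
            if any(abs(d[i]) > threshold for d in details)]
    segments, start = [], 0
    for cut in cuts:
        segments.append((start, cut))
        start = cut
    segments.append((start, length))
    return segments


# Two hypothetical property layers over the same six residues.
layers = [
    [0.0, 0.0, 0.0, 5.0, 5.0, 5.0],
    [1.0, 1.0, 1.0, 1.0, 9.0, 9.0],
]
print(segment_layers(layers, threshold=1.0))  # [(0, 3), (3, 4), (4, 6)]
```

Multi-level decompositions or smoother wavelets would detect broader changes and be less sensitive to single-residue noise; the one-level Haar version keeps the idea easy to trace.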

    Financial Malware Detect With Job Anomaly

    It is well known that financial fraud, such as money laundering, can also facilitate terrorism or other illegal activity. Such illicit dealings typically involve complex networks of trades and financial transactions, which makes the fraud difficult to uncover. At the same time, a trading network and the features of the entities in it can be constructed from these transactions. The trading network reveals the relationships between entities, allowing investigators to identify fraudulent activity, while entity features describe the entities and expose details of the fraudulent behavior. Network structure and entity features therefore carry complementary information that can improve fraud detection. However, most current approaches use either network information or feature information, not both. In this study, we propose a novel approach, dubbed CoDetect, that exploits both network and feature information. In addition, CoDetect can simultaneously detect fraudulent financial activities and the feature patterns associated with them. Extensive experiments on both synthetic and real-world data demonstrate the framework's ability to tackle financial fraud.
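The idea that network and feature information are complementary can be shown with a deliberately simple scoring sketch. It is not CoDetect: it computes a network signal (how unusual each entity's total transaction volume is) and a feature signal (the mean absolute z-score of its attributes) and mixes them with a weight `alpha`. The data layout, the choice of transaction volume as the network signal, the weighting, and the names `zscores` and `combined_scores` are all hypothetical.

```python
import statistics


def zscores(values):
    """Standardize values; returns zeros if every value is identical."""
    mean = statistics.fmean(values)
    spread = statistics.pstdev(values)
    return [0.0 if spread == 0 else (v - mean) / spread for v in values]


def combined_scores(features, edges, alpha=0.5):
    """features: entity -> list of numeric attributes (same length for all).
    edges: (source, target, amount) transactions between known entities.
    Returns entity -> score mixing a network signal (unusual transaction
    volume) with a feature signal (mean absolute z-score of attributes)."""
    ids = sorted(features)
    volume = {entity: 0.0 for entity in ids}
    for source, target, amount in edges:
        volume[source] += amount
        volume[target] += amount
    network = {e: abs(z) for e, z in zip(ids, zscores([volume[e] for e in ids]))}

    n_attrs = len(features[ids[0]])
    attribute = {entity: 0.0 for entity in ids}
    for j in range(n_attrs):
        column = zscores([features[e][j] for e in ids])
        for e, z in zip(ids, column):
            attribute[e] += abs(z) / n_attrs

    return {e: alpha * network[e] + (1 - alpha) * attribute[e] for e in ids}


# Hypothetical toy data: entity "e" moves unusually large sums and has an odd attribute.
features = {"a": [1.0], "b": [1.1], "c": [0.9], "d": [1.0], "e": [10.0]}
edges = [("a", "b", 10.0), ("c", "d", 10.0), ("e", "a", 500.0), ("e", "b", 500.0)]
scores = combined_scores(features, edges)
print(max(scores, key=scores.get))  # e
```

Setting `alpha` to 1.0 or 0.0 reduces the score to a network-only or feature-only detector, which is the single-source setting the abstract contrasts with.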

    Identify Credit Tag Scheme Using Enhance And The Bulk Of Votes

    In financial services, credit card fraud is a major concern; thousands of dollars are lost to it every year. Research studies analyzing real-world credit card data are scarce because of confidentiality issues. This paper uses machine learning algorithms to detect credit card fraud. Standard models are applied first. Hybrid methods that combine AdaBoost and plurality voting are then applied. A publicly available credit card data set is used to evaluate the efficacy of the models, after which a financial institution's own credit card records are analyzed. To further assess the robustness of the algorithms, noise is added to the data samples. The experimental results show that the plurality voting method achieves high accuracy in detecting credit card fraud.
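Plurality voting itself is simple to state: each base model predicts a label for every transaction, and the ensemble outputs the most frequent label per transaction. The sketch below assumes the base models (for example an AdaBoost model) have already been trained elsewhere and only their predictions are combined; the model names and prediction lists are hypothetical, with 1 meaning fraud.

```python
from collections import Counter


def plurality_vote(predictions):
    """predictions: one list of labels per base classifier, all the same length.
    Returns the most frequent label per sample. Ties go to the label seen
    first, because Counter.most_common keeps first-encountered order among
    equal counts."""
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("every classifier must label the same samples")
    combined = []
    for i in range(n_samples):
        votes = Counter(p[i] for p in predictions)
        combined.append(votes.most_common(1)[0][0])
    return combined


# Hypothetical predictions from three base models (1 = fraud, 0 = legitimate).
adaboost = [1, 0, 0, 1]
svm = [1, 1, 0, 0]
naive_bayes = [0, 0, 0, 1]
print(plurality_vote([adaboost, svm, naive_bayes]))  # [1, 0, 0, 1]
```

With an odd number of models and two classes, ties cannot occur, so the tie-breaking rule only matters for even-sized ensembles or multi-class labels.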

    Data mining for bioinformatics

    xix, 328 p. ; 24 c

    Wavelet-based energy features for glaucomatous image classification

    Texture features within images are actively pursued for accurate and efficient glaucoma classification. The energy distribution over wavelet subbands is used to find these important texture features. In this paper, we investigate the discriminatory potential of wavelet features obtained from the Daubechies (db3), symlets (sym3), and biorthogonal (bio3.3, bio3.5, and bio3.7) wavelet filters. We propose a novel technique to extract energy signatures using the 2-D discrete wavelet transform and subject these signatures to different feature ranking and feature selection strategies. We gauged the effectiveness of the resulting ranked and selected feature subsets using support vector machine, sequential minimal optimization, random forest, and naïve Bayes classifiers. We observed an accuracy of around 93% using tenfold cross-validation, demonstrating the effectiveness of these methods.
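An energy signature from a 2-D discrete wavelet transform can be sketched in a few lines. To stay dependency-free, this example substitutes the orthonormal Haar wavelet for the db3, sym3, and biorthogonal filters named above, performs a single decomposition level, and defines a subband's energy as its mean squared coefficient; all three choices, and the names `haar_dwt2` and `energy_signature`, are assumptions rather than the paper's exact procedure.

```python
def haar_dwt2(image):
    """One-level orthonormal 2-D Haar DWT of an image with even height and width.
    Returns flat coefficient lists for the approximation (A) subband and the
    horizontal (H), vertical (V) and diagonal (D) detail subbands."""
    rows, cols = len(image), len(image[0])
    if rows % 2 or cols % 2:
        raise ValueError("image height and width must be even")
    bands = {"A": [], "H": [], "V": [], "D": []}
    for r in range(0, rows, 2):
        for c in range(0, cols, 2):
            a, b = image[r][c], image[r][c + 1]
            p, q = image[r + 1][c], image[r + 1][c + 1]
            bands["A"].append((a + b + p + q) / 2)  # local average (scaled)
            bands["H"].append((a + b - p - q) / 2)  # top row minus bottom row
            bands["V"].append((a - b + p - q) / 2)  # left column minus right column
            bands["D"].append((a - b - p + q) / 2)  # diagonal difference
    return bands


def energy_signature(image):
    """Average energy (mean squared coefficient) of each subband."""
    return {name: sum(x * x for x in coeffs) / len(coeffs)
            for name, coeffs in haar_dwt2(image).items()}


print(energy_signature([[1, 0], [1, 0]]))  # {'A': 1.0, 'H': 0.0, 'V': 1.0, 'D': 0.0}
```

Because the transform is orthonormal, the total energy of the coefficients equals that of the image, so the signature describes how that energy is distributed across subbands. For the filters used in the paper, PyWavelets' `pywt.dwt2(image, "db3")` returns `(cA, (cH, cV, cD))` and could replace `haar_dwt2`.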

    Use of Nonlinear Features for Automated Characterization of Suspicious Ovarian Tumors Using Ultrasound Images in Fuzzy Forest Framework

    Ovarian cancer is one of the prime causes of mortality in women. Diagnosing ovarian cancer with ultrasonography is tedious because ovarian tumors show only minute clinical and structural differences between the suspicious and non-suspicious classes. Early prediction of ovarian cancer can limit its progression and may save many lives. Computer-aided diagnosis (CAD) is a noninvasive way to find ovarian cancer at an early stage, which can spare patients anxiety and unnecessary biopsies. This study investigates the efficacy of a novel CAD tool that characterizes suspicious ovarian tumors using Radon-transformed nonlinear features. The dimensionality of the extracted features is reduced with the Relief-F feature selection method. We employ a fuzzy forest-based ensemble classifier in contrast to the well-known crisp rule-based classifiers. The proposed method was evaluated on 469 subjects (non-suspicious: 238, suspicious: 231) and achieved a maximum accuracy of 80.60 ± 0.5%, a sensitivity of 81.40%, and a specificity of 76.30% with fuzzy forest, an ensemble fuzzy classifier, using thirty-nine features. The proposed method is robust and reproducible because it uses the largest number of subjects (469) compared with state-of-the-art techniques. Hence, it can be used as an assistive tool by gynecologists during routine screening.
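Relief-F scores features by how well they separate each sample from its nearest neighbors of other classes while staying similar to its nearest neighbors of the same class. The sketch below implements the basic Relief variant (one nearest hit and one nearest miss per sample, every sample visited once); Relief-F, used in the study, extends this to k nearest neighbors and multiple classes. The range-normalized Manhattan distance and the toy data are assumptions for illustration.

```python
def relief(X, y):
    """Basic Relief feature weighting. A feature gains weight when it differs
    between a sample and its nearest miss (other class) and loses weight when
    it differs between the sample and its nearest hit (same class)."""
    n, p = len(X), len(X[0])
    span = []
    for f in range(p):
        column = [row[f] for row in X]
        width = max(column) - min(column)
        span.append(width if width > 0 else 1.0)  # avoid dividing by zero

    def diff(f, a, b):
        """Range-normalized difference on feature f, in [0, 1]."""
        return abs(a[f] - b[f]) / span[f]

    def distance(a, b):
        return sum(diff(f, a, b) for f in range(p))

    weights = [0.0] * p
    for i in range(n):
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        if not hits or not misses:
            continue
        hit = min(hits, key=lambda j: distance(X[i], X[j]))
        miss = min(misses, key=lambda j: distance(X[i], X[j]))
        for f in range(p):
            weights[f] += (diff(f, X[i], X[miss]) - diff(f, X[i], X[hit])) / n
    return weights


# Hypothetical data: feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 5.0], [0.1, 1.0], [1.0, 4.0], [0.9, 2.0]]
y = [0, 0, 1, 1]
print(relief(X, y))
```

Features would then be ranked by weight and only the top-scoring ones kept, which is how a selector of this kind reduces the dimensionality of an extracted feature set.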